AITopics | annotation layer

Collaborating Authors

annotation layer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PyTorch-IE: Fast and Reproducible Prototyping for Information Extraction

Binder, Arne, Hennig, Leonhard, Alt, Christoph

arXiv.org Artificial IntelligenceMay-16-2024

The objective of Information Extraction (IE) is to derive structured representations from unstructured or semi-structured documents. However, developing IE models is complex due to the need of integrating several subtasks. Additionally, representation of data among varied tasks and transforming datasets into task-specific model inputs presents further challenges. To streamline this undertaking for researchers, we introduce PyTorch-IE, a deep-learning-based framework uniquely designed to enable swift, reproducible, and reusable implementations of IE models. PyTorch-IE offers a flexible data model capable of creating complex data structures by integrating interdependent layers of annotations derived from various data types, like plain text or semi-structured text, and even images. We propose task modules to decouple the concerns of data representation and model-specific representations, thereby fostering greater flexibility and reusability of code. PyTorch-IE also extends support for widely used libraries such as PyTorch-Lightning for training, HuggingFace datasets for dataset reading, and Hydra for experiment configuration. Supplementary libraries and GitHub templates for the easy setup of new projects are also provided. By ensuring functionality and versatility, PyTorch-IE provides vital support to the research community engaged in Information Extraction.

annotation, computational linguistic, pytorch-ie, (11 more...)

arXiv.org Artificial Intelligence

2406.00007

Country:

Europe > United Kingdom (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre:

Overview (0.47)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Opera Graeca Adnotata: Building a 34M+ Token Multilayer Corpus for Ancient Greek

Celano, Giuseppe G. A.

arXiv.org Artificial IntelligenceMar-31-2024

In this article, the beta version 0.1.0 of Opera Graeca Adnotata (OGA), the largest open-access multilayer corpus for Ancient Greek (AG) is presented. OGA consists of 1,687 literary works and 34M+ tokens coming from the PerseusDL and OpenGreekAndLatin GitHub repositories, which host AG texts ranging from about 800 BCE to about 250 CE. The texts have been enriched with seven annotation layers: (i) tokenization layer; (ii) sentence segmentation layer; (iii) lemmatization layer; (iv) morphological layer; (v) dependency layer; (vi) dependency function layer; (vii) Canonical Text Services (CTS) citation layer. The creation of each layer is described by highlighting the main technical and annotation-related issues encountered. Tokenization, sentence segmentation, and CTS citation are performed by rule-based algorithms, while morphosyntactic annotation is the output of the COMBO parser trained on the data of the Ancient Greek Dependency Treebank. For the sake of scalability and reusability, the corpus is released in the standoff formats PAULA XML and its offspring LAULA XML.

annotation, annotation layer, section 4, (16 more...)

arXiv.org Artificial Intelligence

2404.00739

Country:

Europe > Czechia > Prague (0.05)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Add feedback

Homonymy Information for English WordNet

Maudslay, Rowan Hall, Teufel, Simone

arXiv.org Artificial IntelligenceDec-16-2022

A widely acknowledged shortcoming of WordNet is that it lacks a distinction between word meanings which are systematically related (polysemy), and those which are coincidental (homonymy). Several previous works have attempted to fill this gap, by inferring this information using computational methods. We revisit this task, and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem using clustering methods; by contrast, our method works by linking WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions based on their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of .97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.

artificial intelligence, natural language, text processing, (16 more...)

arXiv.org Artificial Intelligence

2212.08388

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(14 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback